Antecedents of open source software defects: A data mining approach to model formulation, validation and testing
Abstract
This paper develops, tests, and validates a model for the antecedents of open source software (OSS) defects using Data Mining and Text Mining. The public archives of OSS projects provide historical data on over 5,000 active and mature OSS projects. Using domain knowledge and exploratory analysis, a wide range of variables is identified from the process, product, resource, and end-user characteristics of a project, so that the model is robust and considers all aspects of the system. Multiple Data Mining techniques are used to refine the model, and the data is enriched through Text Mining for knowledge discovery from qualitative information. The study demonstrates the suitability of Data Mining and Text Mining for model building. Results indicate that project type, end-user activity, process quality, team size, and project popularity have a significant impact on the defect density of operational OSS projects. Since many organizations, both for-profit and not-for-profit, are beginning to use OSS as an economic alternative to commercial software, these results can inform decisions about which software an organization can reasonably maintain.
Journal: Information Technology and Management
Volume: 10
Publication date: 2009